Rényi entropy

In information theory, the Rényi entropy, a generalisation of Shannon entropy, is one of a family of functionals for quantifying the diversity, uncertainty or randomness of a system. It is named after Alfréd Rényi.

The Rényi entropy of order α, where α \geq 0 and α \neq 1, is defined as

H_\alpha(X) = \frac{1}{1-\alpha}\log\Bigg(\sum_{i=1}^n p_i^\alpha\Bigg)

where p_i are the probabilities of the outcomes {x_1, x_2, ..., x_n} and \log is taken in base 2. If the probabilities are all equal, then all the Rényi entropies of the distribution are equal, with H_\alpha(X) = \log n. Otherwise the entropies are weakly decreasing as a function of α.

Higher values of α, approaching infinity, give a Rényi entropy which is increasingly determined by the highest-probability events. Lower values of α, approaching zero, give a Rényi entropy which increasingly weights all possible events more equally, regardless of their probabilities. The intermediate case α = 1 gives the Shannon entropy, which has special properties. The case α = 0 gives the Hartley entropy \log n, which is the maximum possible value of the Shannon entropy on n outcomes.
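The dependence on α can be illustrated numerically. The following Python helper (an illustrative sketch, not part of the original treatment) computes H_α with base-2 logarithms, including the limiting cases α = 1 and α = ∞:

```python
import math

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (base-2 logs), with the
    limiting cases alpha = 1 (Shannon) and alpha = inf (min-entropy)."""
    p = [x for x in p if x > 0]
    if alpha == 1:
        return -sum(x * math.log2(x) for x in p)
    if alpha == math.inf:
        return -math.log2(max(p))
    return math.log2(sum(x ** alpha for x in p)) / (1 - alpha)

dist = [0.5, 0.25, 0.125, 0.125]
values = [renyi_entropy(dist, a) for a in (0, 0.5, 1, 2, math.inf)]
# The entropies are weakly decreasing in alpha for any fixed distribution.
assert all(h1 >= h2 - 1e-12 for h1, h2 in zip(values, values[1:]))

# For the uniform distribution all orders coincide at log2(n) = 2 bits.
uniform = [0.25] * 4
assert all(abs(renyi_entropy(uniform, a) - 2.0) < 1e-12
           for a in (0, 0.5, 1, 2, math.inf))
```

Running this for the non-uniform `dist` shows the strictly decreasing sequence of entropies, while the uniform distribution gives the constant value log n.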

The Rényi entropies are important in ecology and statistics as indices of diversity. They are also important in quantum information, where they can be used as a measure of entanglement; in the XY Heisenberg spin chain, the Rényi entropy has been calculated explicitly as a modular function of α. Rényi entropies also lead to a spectrum of indices of fractal dimension.

Hα for some particular values of α

Some particular cases:

H_0 (X) = \log n = \log |X|,\,

which is the logarithm of the cardinality of X, sometimes called the Hartley entropy of X.

In the limit that \alpha approaches 1, it can be shown using L'Hôpital's Rule that H_\alpha converges to

H_1 (X) = - \sum_{i=1}^n p_i \log p_i

which is the Shannon entropy.

Collision entropy, sometimes just called "Rényi entropy," refers to the case \alpha = 2,

H_2 (X) = - \log \sum_{i=1}^n p_i^2 = - \log P(X = Y)

where Y is a random variable independent of X but identically distributed to X. As \alpha \rightarrow \infty, H_\alpha converges to the limit

H_\infty (X) = - \log \sup_{i=1..n} p_i

and this is called the min-entropy, because it is the smallest value of H_\alpha.
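These four special cases can be checked directly on a small distribution. The snippet below is an illustrative sketch (the distribution chosen is arbitrary):

```python
import math

p = [0.5, 0.3, 0.2]

# H_0: Hartley entropy, log2 of the support size.
h0 = math.log2(len(p))

# H_1: Shannon entropy, the alpha -> 1 limit.
h1 = -sum(x * math.log2(x) for x in p)

# H_2: collision entropy; sum(p_i^2) equals P(X = Y) for i.i.d. X, Y.
collision_prob = sum(x ** 2 for x in p)
h2 = -math.log2(collision_prob)

# H_inf: min-entropy, determined only by the most likely outcome.
hinf = -math.log2(max(p))

# The entropies are weakly decreasing in alpha.
assert h0 >= h1 >= h2 >= hinf
```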

Inequalities between different values of α

The latter two cases are related by  H_\infty \le H_2 \le 2 H_\infty . On the other hand, the Shannon entropy H_1 can be arbitrarily high for a random variable X with fixed min-entropy.

 H_2 \le 2 H_\infty holds because  \log \sum\limits_{i = 1}^n p_i^2 \ge \log \sup_i p_i^2 = 2 \log \sup_i p_i .
 H_\infty \le H_2 holds because  \log \sum\limits_{i = 1}^n p_i^2 \le \log \left( \sup_i p_i \sum\limits_{i = 1}^n p_i \right) = \log \sup_i p_i .
 H_1(X) \ge H_2(X) follows from Jensen's inequality applied to the concave logarithm:  \sum\limits_{i = 1}^n p_i \log p_i \le \log \sum\limits_{i = 1}^n p_i^2 .
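These inequalities, and the claim that the Shannon entropy can grow without bound while the min-entropy stays fixed, can be checked numerically. The construction below (keeping the largest probability at 1/2 and spreading the rest over ever more outcomes) is one illustrative choice:

```python
import math

def renyi(p, alpha):
    """Rényi entropy in bits, with the alpha = 1 and alpha = inf limits."""
    p = [x for x in p if x > 0]
    if alpha == 1:
        return -sum(x * math.log2(x) for x in p)
    if alpha == math.inf:
        return -math.log2(max(p))
    return math.log2(sum(x ** alpha for x in p)) / (1 - alpha)

p = [0.4, 0.3, 0.2, 0.1]
h1, h2, hinf = renyi(p, 1), renyi(p, 2), renyi(p, math.inf)
assert hinf <= h2 <= 2 * hinf   # H_inf <= H_2 <= 2 H_inf
assert h1 >= h2                 # H_1 >= H_2

# Fixed min-entropy (largest probability 1/2, so H_inf = 1 bit) but
# Shannon entropy growing roughly like (1/2) * log2(n).
for n in (2, 20, 200):
    q = [0.5] + [0.5 / n] * n
    assert renyi(q, math.inf) == 1.0
```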

Rényi divergence

As well as the absolute Rényi entropies, Rényi also defined a spectrum of divergence measures generalising the Kullback–Leibler divergence.

The Rényi divergence of order α, where α > 0, from a distribution P to a distribution Q is defined to be:

D_\alpha (P \| Q) = \frac{1}{\alpha-1}\log\Bigg(\sum_{i=1}^n \frac{p_i^\alpha}{q_i^{\alpha-1}}\Bigg) = \frac{1}{\alpha-1}\log \sum_{i=1}^n p_i^\alpha q_i^{1-\alpha}\,

Like the Kullback–Leibler divergence, the Rényi divergences are non-negative for α > 0. This divergence is also known as the α-divergence (alpha-divergence).

Some special cases:

D_0(P \| Q) = - \log Q(\{i : p_i > 0\}) : minus the log probability under Q of the set where p_i > 0;
D_{1/2}(P \| Q) = -2 \log \sum_{i=1}^n \sqrt{p_i q_i}  : minus twice the logarithm of the Bhattacharyya coefficient;
D_1(P \| Q) = \sum_{i=1}^n p_i \log \frac{p_i}{q_i} : the Kullback-Leibler divergence;
D_2(P \| Q) = \log \Big\langle \frac{p_i}{q_i} \Big\rangle \,  : the log of the expected ratio of the probabilities;
D_\infty(P \| Q) = \log \sup_i \frac{p_i}{q_i}  : the log of the maximum ratio of the probabilities.
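These special cases can be verified numerically. The helper below is an illustrative implementation using base-2 logarithms, with arbitrarily chosen distributions:

```python
import math

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(P || Q) in bits, with the limiting
    cases alpha = 1 (Kullback-Leibler) and alpha = inf."""
    if alpha == 1:
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    if alpha == math.inf:
        return math.log2(max(pi / qi for pi, qi in zip(p, q) if pi > 0))
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log2(s) / (alpha - 1)

p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]

# alpha = 1/2: minus twice the log of the Bhattacharyya coefficient.
bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
assert abs(renyi_divergence(p, q, 0.5) - (-2 * math.log2(bc))) < 1e-12

# The divergences are non-negative (and, in fact, non-decreasing in alpha).
ds = [renyi_divergence(p, q, a) for a in (0.5, 1, 2, math.inf)]
assert all(d >= 0 for d in ds)
assert all(a <= b + 1e-12 for a, b in zip(ds, ds[1:]))
```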

Why α = 1 is special

The value α = 1, which gives the Shannon entropy and the Kullback–Leibler divergence, is special because it is only when α=1 that one can separate out variables A and X from a joint probability distribution, and write:

H(A,X) =  H(A) + \mathbb{E}_{p(a)} \{ H(X|a) \}

for the absolute entropies, and

D_\mathrm{KL}(p(x|a)p(a)\|m(x,a)) =  \mathbb{E}_{p(a)}\{D_\mathrm{KL}(p(x|a)\|m(x|a))\} + D_\mathrm{KL}(p(a)\|m(a)),

for the relative entropies.

The latter in particular means that if we seek a distribution p(x,a) which minimizes the divergence from some underlying prior measure m(x,a), and we acquire new information which affects only the distribution of a, then the distribution of p(x|a) remains m(x|a), unchanged.
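The chain-rule decomposition for the relative entropies can be checked on a small discrete example (the particular distributions below are arbitrary, chosen only for illustration):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# p(a) and p(x|a) for a in {0, 1}; likewise the reference measure m.
p_a = [0.7, 0.3]
p_x_given_a = [[0.9, 0.1], [0.4, 0.6]]
m_a = [0.5, 0.5]
m_x_given_a = [[0.8, 0.2], [0.5, 0.5]]

# Joint distributions p(x, a) = p(x|a) p(a), flattened over (a, x) pairs.
p_joint = [p_x_given_a[a][x] * p_a[a] for a in range(2) for x in range(2)]
m_joint = [m_x_given_a[a][x] * m_a[a] for a in range(2) for x in range(2)]

# Chain rule: the joint divergence splits into the conditional divergences
# averaged over p(a), plus the divergence of the marginals of a.
lhs = kl(p_joint, m_joint)
rhs = (sum(p_a[a] * kl(p_x_given_a[a], m_x_given_a[a]) for a in range(2))
       + kl(p_a, m_a))
assert abs(lhs - rhs) < 1e-12
```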

The other Rényi entropies and divergences satisfy the criteria of being positive and continuous, being invariant under one-to-one coordinate transformations, and combining additively when A and X are independent, so that if p(A,X) = p(A)p(X), then

H_\alpha(A,X) = H_\alpha(A) + H_\alpha(X)\;

and

D_\alpha(P(A)P(X)\|Q(A)Q(X)) = D_\alpha(P(A)\|Q(A)) + D_\alpha(P(X)\|Q(X)).
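Additivity under independence can be confirmed numerically for a few values of α (the distributions below are illustrative):

```python
import math

def renyi(p, alpha):
    """Rényi entropy in bits for alpha not in {1, inf}."""
    return math.log2(sum(x ** alpha for x in p)) / (1 - alpha)

def renyi_div(p, q, alpha):
    """Rényi divergence in bits for alpha not in {1, inf}."""
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return math.log2(s) / (alpha - 1)

pa, px = [0.7, 0.3], [0.2, 0.5, 0.3]
qa, qx = [0.5, 0.5], [0.4, 0.4, 0.2]

# Product distributions for independent A and X.
joint_p = [a * x for a in pa for x in px]
joint_q = [a * x for a in qa for x in qx]

for alpha in (0.5, 2, 3):
    # Entropies and divergences both combine additively.
    assert abs(renyi(joint_p, alpha)
               - (renyi(pa, alpha) + renyi(px, alpha))) < 1e-12
    assert abs(renyi_div(joint_p, joint_q, alpha)
               - (renyi_div(pa, qa, alpha) + renyi_div(px, qx, alpha))) < 1e-12
```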

The stronger properties of the α = 1 quantities, which allow the definition of conditional information and mutual information from communication theory, may be very important in other applications, or entirely unimportant, depending on those applications' requirements.

Exponential families

The Rényi entropies and divergences for an exponential family admit simple expressions (Nielsen & Nock, 2011)


H_\alpha(p_F(x;\theta)) =  \frac{1}{1-\alpha} \left(F(\alpha\theta)-\alpha F(\theta)+\log E_p[e^{(\alpha-1)k(x)}]\right)

and


D_\alpha(p:q) = \frac{J_{F,\alpha}(\theta:\theta')}{1-\alpha}

where


J_{F,\alpha}(\theta:\theta')= \alpha F(\theta)+(1-\alpha) F(\theta')- F(\alpha\theta+(1-\alpha)\theta')

is a Jensen difference divergence.
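As a concrete check, take the Bernoulli family, an exponential family with natural parameter θ = log(p/(1−p)), log-normaliser F(θ) = log(1 + e^θ), and carrier term k(x) = 0 (so the expectation term in the entropy formula vanishes). The closed form J_{F,α}/(1−α) then matches the direct definition of the Rényi divergence; natural logarithms are used here for convenience:

```python
import math

# Bernoulli as an exponential family: theta = log(p / (1-p)),
# log-normaliser F(theta) = log(1 + e^theta), carrier k(x) = 0.
def F(theta):
    return math.log(1 + math.exp(theta))

def jensen_diff(theta, theta_p, alpha):
    # J_{F,alpha}(theta : theta') = alpha F(theta) + (1-alpha) F(theta')
    #                               - F(alpha theta + (1-alpha) theta')
    return (alpha * F(theta) + (1 - alpha) * F(theta_p)
            - F(alpha * theta + (1 - alpha) * theta_p))

def renyi_div(p, q, alpha):
    # Direct definition (natural logs) for Bernoulli(p) vs Bernoulli(q).
    s = p ** alpha * q ** (1 - alpha) + (1 - p) ** alpha * (1 - q) ** (1 - alpha)
    return math.log(s) / (alpha - 1)

p, q = 0.8, 0.3
theta, theta_p = math.log(p / (1 - p)), math.log(q / (1 - q))
for alpha in (0.25, 0.5, 2.0):
    closed_form = jensen_diff(theta, theta_p, alpha) / (1 - alpha)
    assert abs(closed_form - renyi_div(p, q, alpha)) < 1e-12
```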

References

A. Rényi (1961). "On measures of information and entropy". Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability 1960. pp. 547–561. http://digitalassets.lib.berkeley.edu/math/ucb/text/math_s4_v1_article-27.pdf. 

A. O. Hero, O. Michael and J. Gorman (2002). Alpha-divergences for Classification, Indexing and Retrieval. http://www.eecs.umich.edu/~hero/Preprints/cspl-328.pdf. 

F. Nielsen and S. Boltz (2010). "The Burbea-Rao and Bhattacharyya centroids". arXiv:1004.5049. 

Nielsen, Frank; Nock, Richard (2011). "On Rényi and Tsallis entropies and divergences for exponential families". arXiv:1105.3259. 

O. A. Rosso (2006). "EEG analysis using wavelet-based information tools". Journal of Neuroscience Methods 153: 163–182.

F. Franchini, A. R. Its and V. E. Korepin (2008). "Rényi entropy as a measure of entanglement in quantum spin chain". Journal of Physics A: Math. Theor. 41: 025302.

T. van Erven (2010). When Data Compression and Statistics Disagree (Ph.D. thesis). hdl:1887/15879. Chapter 6.
